Abstract- VLIW stands for Very Long Instruction Word. This
processor architecture is based on parallel processing, in which
more than one instruction is executed in parallel, and is used to
increase instruction throughput. It forms the basis of modern
superscalar processors. A VLIW processor is essentially a RISC
processor; the difference is that its instruction word is longer
than that of a RISC processor. During program execution the
operands are stored in the general purpose register file, which
is a collection of registers. The number of registers in the
register file varies with the processor architecture. This paper
presents the design of a 64-bit decode stage. This stage of the
pipeline decodes the instruction fetched by the fetch stage. The
decode stage also fetches register data from the register file;
the register whose operand is transferred is selected by a
five-bit address. To generate the gate-level netlist, synthesis
is performed in Xilinx ISE 13.1 targeting a Virtex 4 FPGA
(4vfx12sf363 package, speed grade -12). After synthesis the total
memory usage is 200332 kilobytes and the number of bonded IOBs is
435. The decode stage is synthesized and targeted for the Xilinx
Virtex 4 FPGA, and the results for the 64-bit decode stage show
improved speed compared with previous work.

Keywords- VLIW; Decode Stage; VHDL; Synthesis; Synopsys 
Tools. 

I. INTRODUCTION 

VLIW stands for Very Long Instruction Word. This processor
architecture is based on parallel processing, in which more than
one instruction is executed in parallel, and is used to increase
instruction throughput. It forms the basis of modern superscalar
processors. A VLIW processor is essentially a RISC processor; the
difference is that its instruction word is longer than that of a
RISC processor [2]. The field programmable gate array (FPGA) is
widely used in IC design as an easy means of prototyping and
verifying a functional design without the need for fabrication.
It is also used in system designs whose volume is not large
enough to justify fabrication. High-density FPGAs with millions
of gates are readily available and allow large and complex
designs to be built on FPGAs. With a high-density FPGA, it is
possible to design a complex superscalar microprocessor core,
which eases functional verification of the core. To increase the
MIPS of a superscalar microprocessor implemented on an FPGA, one
possibility is to increase the number of parallel pipes in the
microprocessor to allow better instruction-level parallelism.
The Synopsys Verilog Compiler Simulator, a tool from Synopsys
designed specifically for simulation, is also used.

II. IMPLEMENTING DECODE UNIT MECHANISM OF A
SUPERSCALAR PIPELINE MICROPROCESSOR ON FPGA

A. The decode stages of all 3 parallel pipes are designed with
a register bypass mechanism to cater for all cases of
instruction dependency [3]. For an n-pipe (n = 3 to 6)
superscalar pipeline microprocessor, the register bypass
mechanism must cater for a total of y conditions that require
register bypassing [3].

B. Intra-pipe register bypass conditions = $n(n+2)$

C. Inter-pipe register bypass conditions = $n^4 + n^2 - 2n$

D. Total conditions = y = $n^4 + 2n^2$ = 81 + 18 = 99 (for n = 3)

Register bypass logic is implemented for all 15 intra-pipe
conditions and all 84 inter-pipe conditions, resulting in a
total of 99 bypass conditions [1].

Note: To ease understanding of the RTL code of the decode
module, only partial bypassing logic is implemented. For load
instructions, a written register value can be used by a
dependent instruction only after 2 clocks. For other
instructions, a written register value can be used by a
dependent instruction after 1 clock for intra-pipe bypass and
after 2 clocks for inter-pipe bypass.
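The counts above can be checked numerically. The following is a minimal sketch that evaluates the intra-pipe and inter-pipe expressions from the text for n = 3 to 6; the function name `bypass_conditions` is illustrative and not taken from the cited work.

```python
def bypass_conditions(n: int) -> dict:
    """Count register bypass conditions for an n-pipe core,
    using the expressions quoted in the text."""
    intra = n * (n + 2)            # intra-pipe conditions: n(n+2)
    inter = n**4 + n**2 - 2 * n    # inter-pipe conditions: n^4 + n^2 - 2n
    total = intra + inter          # algebraically equal to n^4 + 2n^2
    return {"intra": intra, "inter": inter, "total": total}


if __name__ == "__main__":
    for n in range(3, 7):
        print(n, bypass_conditions(n))
```

For n = 3 this gives 15 intra-pipe and 84 inter-pipe conditions, 99 in total, and the total grows to 1368 conditions for n = 6.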
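The timing rule in the note can be written as a small lookup. This is only a behavioural sketch of the stated latencies, not the RTL of the decode module; the names `bypass_latency` and `earliest_use` are hypothetical.

```python
def bypass_latency(is_load: bool, same_pipe: bool) -> int:
    """Clocks before a written register value can feed a dependent
    instruction, following the rule stated in the note."""
    if is_load:
        return 2                   # load results: usable after 2 clocks
    return 1 if same_pipe else 2   # intra-pipe: 1 clock, inter-pipe: 2 clocks


def earliest_use(write_cycle: int, is_load: bool, same_pipe: bool) -> int:
    """Earliest cycle in which a dependent instruction may read the value."""
    return write_cycle + bypass_latency(is_load, same_pipe)
```

For example, a value written by an add in cycle 10 could be consumed in the same pipe from cycle 11, and in a different pipe from cycle 12.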

The work in [4] used a 32-bit VLIW microprocessor. This paper
uses a VLIW microprocessor with 3 operations per VLIW
instruction (using a customized instruction set consisting of
logical and Boolean operations) as the test vehicle for a study
of the decode stage. The implemented microprocessor core is a
4-stage pipeline, 3-parallel-pipe superscalar VLIW
microprocessor [5]. VLIW is chosen over CISC/RISC because of the
ease of scaling a VLIW microprocessor. The VLIW microprocessor,
with its decode stage mechanism for 3 parallel operations per
VLIW instruction, is simulated with the ModelSim SE 10.1 tool,
then synthesized for the device "4vfx12sf363" and verified on a
Spartan-3E FPGA board using combinational and sequential logic
(using ModelSim and Xilinx tools). It is then analyzed for
delay, logic element usage and power consumption (implemented
for 3/4/5/6 parallel pipes). The Spartan-3E XC3S500E-4FG320 FPGA
is used because the target needs adequate logic elements and
enough usable I/O pins to implement the VLIW microprocessor
core. The microprocessor core is designed with a shared register
file of 16 registers accessible by all 3 pipes. Each register
has the same width as the data bus.

Figure 1. Top Level Block Diagram of Superscalar Pipeline VLIW
Microprocessor. [1]

III. IMPLEMENTATION OF LARGE DATA BUS SIZE 
VLIW MICROPROCESSOR ON FPGA 

A 64-bit VLIW microprocessor core with a custom instruction set
is implemented on a Spartan 3E FPGA board ("XC3S500E-4FG320").
It uses arithmetic, logic, load, read and compare operations
[6, 7, 8], giving a minimal customized instruction set of 16
instructions:

1. nop 

2. add 

3. sub 

4. mul 

5. load 

6. move 

7. read

8. compare 

9. xor 

10. nand 

11. nor

12. not 

13. shift left 

14. shift right 

15. barrel shift left 

16. barrel shift right 
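Sixteen instructions fit in a 4-bit opcode field. The sketch below shows how a decode stage might split one 64-bit VLIW word into three operations. The paper does not give the instruction encoding, so everything about the layout here is an assumption made for illustration: opcode values follow the list order, each slot is 19 bits (4-bit opcode plus three 5-bit register addresses), and slot 0 occupies the least significant bits. The names `Opcode`, `Operation`, `encode_slot` and `decode_word` are hypothetical.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Opcode(IntEnum):
    # Assumed numbering: list order from the text, starting at 0.
    NOP = 0
    ADD = 1
    SUB = 2
    MUL = 3
    LOAD = 4
    MOVE = 5
    READ = 6
    COMPARE = 7
    XOR = 8
    NAND = 9
    NOR = 10
    NOT = 11
    SHIFT_LEFT = 12
    SHIFT_RIGHT = 13
    BARREL_SHIFT_LEFT = 14
    BARREL_SHIFT_RIGHT = 15


class Operation(NamedTuple):
    opcode: Opcode
    dest: int
    src1: int
    src2: int


SLOT_BITS = 19  # hypothetical: 4-bit opcode + three 5-bit register addresses
SLOTS = 3       # 3 operations per VLIW instruction, as in the text


def encode_slot(opcode: Opcode, dest: int, src1: int, src2: int) -> int:
    """Pack one operation into the assumed 19-bit slot layout."""
    return (int(opcode) << 15) | (dest << 10) | (src1 << 5) | src2


def decode_word(word: int) -> List[Operation]:
    """Split a 64-bit word into its three operations (assumed layout)."""
    ops = []
    for slot in range(SLOTS):
        bits = (word >> (slot * SLOT_BITS)) & ((1 << SLOT_BITS) - 1)
        ops.append(Operation(
            opcode=Opcode((bits >> 15) & 0xF),
            dest=(bits >> 10) & 0x1F,
            src1=(bits >> 5) & 0x1F,
            src2=bits & 0x1F,
        ))
    return ops
```

Under this assumed layout the three slots use 57 of the 64 bits, leaving 7 bits unused; a real encoding may allocate the fields differently.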

IV. EXPERIMENTAL RESULTS SIMULATION 

To check the functionality of the decode stage, simulation is
performed in ModelSim 6.4a. The waveform shows 64-bit data being
processed in the decode stage.



Final Results for 64-Bit decode stage
Microprocessor on Virtex 4 and Design Compiler.

TABLE I.

VLIW                       Spartan 3E (xc3s500E)
No. of Slices              5272/5472 (96%)
No. of slice flip flops    399/10944 (3%)
No. of 4 input LUTs        10225/10944 (93%)
Minimum Period             3.806 ns

TABLE II.

Combinational Area         722393.812500
Noncombinational area      22157.972656
Total cell area            744508.625000
Total Dynamic Power        78.5261 mW
Cell Leakage Power         226.9518 nW
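The minimum period in Table I can be converted to a maximum clock frequency, and the utilization percentages can be recomputed from the used/available counts. The following is a small sketch of that arithmetic; the helper names are illustrative.

```python
def max_frequency_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency in MHz for a given minimum period in ns."""
    return 1000.0 / min_period_ns


def utilization_percent(used: int, available: int) -> float:
    """Resource utilization as a percentage of the available resources."""
    return 100.0 * used / available


if __name__ == "__main__":
    print(f"f_max  = {max_frequency_mhz(3.806):.1f} MHz")
    print(f"slices = {utilization_percent(5272, 5472):.1f} %")
    print(f"FFs    = {utilization_percent(399, 10944):.1f} %")
    print(f"LUTs   = {utilization_percent(10225, 10944):.1f} %")
```

A minimum period of 3.806 ns corresponds to roughly 262.7 MHz. The recomputed utilizations agree with the table when truncated to whole percentages.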



V. CONCLUSION 

This paper presented the hardware implementation of a 64-bit
decode stage for a VLIW microprocessor. Increasing the number of
parallel pipes allows more parallelism. However, the number of
register bypass conditions also increases. For n parallel pipes
(n = 3 to 6), the number of conditions is given by the
expression $n^4 + 2n^2$. This means that the amount of logic
required to implement the register bypass conditions also
increases. In this study, the register bypass logic is
implemented for 3/4/5/6 parallel operations per VLIW instruction
on the Spartan-3E FPGA "XC3S500E-4FG320". Each implementation is
evaluated for speed and power; the results are given in Tables I
and II and show a relative increase in speed.

REFERENCES 

[1] Weng Fook Lee, Azrul Halim, Nor Hisham, Yap Vooi Voon, Lo Hai
Hiung, and Patrick Sebastian, "Implementation Results on Register
Bypass Conditions of an n-Parallel Pipes Superscalar Pipeline
Microprocessor Core on FPGA," 2007.

[2] A. K. Jones, R. Hoare, D. Kusic, J. Fazekas, and J. Foster, "An FPGA
based VLIW Processor with custom hardware execution," in Proc. ACM
FPGA Symposium, Monterey, CA, 2005.

[3] Weng Fook Lee and Ali Yeon Md Shakaff, "Implementing a Large Data
Bus VLIW Microprocessor," Emerald Systems Design Center, Kompleks
Sri Sg Nibong, Bayan Lepas, 11900, Penang, Malaysia; School of
Computer and Communication Engineering, University Malaysia Perlis.
American Journal of Applied Sciences, 2008.

[4] Weng Fook Lee, Azrul Halim, Nor Hisham, Yap Vooi Voon, Lo Hai
Hiung, and Patrick Sebastian, "Implementation results on register
bypass conditions of an n-parallel pipes superscalar pipeline
microprocessor core on FPGA," Proceedings of 10th Euro Micro
Conference on Digital System Design, WIP, SEA-Publications
SEA-SR-16, July 2007-11-23. ISBN 978-3-902457-16-5, 2007.

[5] John L. Hennessy and David A. Patterson, Computer Organization
and Design: The Hardware/Software Interface. Morgan Kaufmann, 2003.

[6] Weng Fook Lee, VHDL Coding and Logic Synthesis With Synopsys.
Academic Press, 2000.

[7] IBM Corp., PowerPC TM Microprocessor Family: The Programming
Environments for 32-Bit Microprocessors, 2000.

[8] Intel Corp., i960® Jx Microprocessor Instruction Set and Register,
1994.



IJTEL || ISSN: 2319-2135

84 || www.ijtel.org

International Journal of Technological Exploration and Learning (IJTEL)
Volume 2 Issue 1 (February 2013)

Figure 2. Simulation results of 64-bit decode stage VLIW Processor.